Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force outlining for array EA API #21063

Merged
merged 1 commit into from
Feb 7, 2025

Conversation

amicic
Copy link
Contributor

@amicic amicic commented Feb 3, 2025

Macros for Array Effective Address calculations are inherently inlined
and seem to create much pressure either on register or code cash in
Bytecode interpreter.

They are rewritten in C and forced to be outlined specifically for GNU
compilers on X and Z, where we saw regression when Offheap was
introduced (what made the macros more complex, creating even more
pressure).

For other platforms where we did not see regression, we continue to
inline (ATM unknown if outlining would have negative or possitive
effect). Hence we still keep it in a header (*.h) file.

@amicic
Copy link
Contributor Author

amicic commented Feb 3, 2025

jenkins compile all jdk21

@amicic amicic force-pushed the array_access_macros branch from 115daa0 to 8659ab3 Compare February 3, 2025 23:10
@amicic
Copy link
Contributor Author

amicic commented Feb 4, 2025

jenkins compile all jdk21

@amicic amicic force-pushed the array_access_macros branch from 8659ab3 to 244e64e Compare February 4, 2025 19:39
@amicic
Copy link
Contributor Author

amicic commented Feb 4, 2025

jenkins compile all jdk8

@amicic amicic changed the title WIP: Force outlining for array EA API Force outlining for array EA API Feb 4, 2025
@amicic amicic force-pushed the array_access_macros branch from 244e64e to 9f7becc Compare February 4, 2025 21:50
@amicic
Copy link
Contributor Author

amicic commented Feb 4, 2025

please review: @dmitripivkine @TobiAjila @gacholio

@amicic
Copy link
Contributor Author

amicic commented Feb 4, 2025

The re-write is intentionally contained within 64bit and API that uses vmThread. If there are no negative side effects 32bit and javaVM APIs will be rewritten, too.

Copy link
Contributor

@tajila tajila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. My only concern is that VMINLINE may not be as strong a guarantee of inlining as a macro in all compilers. I guess we will find out pretty soon.

@tajila
Copy link
Contributor

tajila commented Feb 5, 2025

jenkins test sanity,extended.functional alinux64 jdk17

@tajila
Copy link
Contributor

tajila commented Feb 6, 2025

Some failures

 stderr:
16:01:27  Unhandled exception
16:01:27  Type=Segmentation error vmState=0x00000000
16:01:27  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
16:01:27  Handler1=0000FFFF82035034 Handler2=0000FFFF82697A40 InaccessibleAddress=00000001FFF3A87E
16:01:27  R0=0000FFFF62ECB820 R1=00000001FFF3A87E R2=0000000000000001 R3=0000000000000001
16:01:27  R4=00000001FFF3A87F R5=0000FFFF62ECB821 R6=0000000000000000 R7=0000000000000038
16:01:27  R8=000000000002F960 R9=0000FFFF2925CFC4 R10=0000000000000001 R11=0000000000000008
16:01:27  R12=0000000000000018 R13=0000000000000003 R14=0000000000000000 R15=0000000000000004
16:01:27  R16=0000FFFF822002E8 R17=0000FFFF82967910 R18=0000000000000000 R19=0000000000376400
16:01:27  R20=00000000004A9F38 R21=FFFFFFFFFFFFFFFE R22=00000000FFF3A878 R23=FFFFFFFFFFFFFFFF
16:01:27  R24=0000000000000001 R25=0000000000000004 R26=0000FFFF62ECB820 R27=00000000FFF3A880
16:01:27  R28=0000FFFF8209D9E0 R29=0000FFFF62ECB550 R30=0000FFFF820B5BD8 R31=0000FFFF62ECB540
16:01:27  PC=0000FFFF829677F0 SP=0000FFFF62ECB540 PSTATE=0000000080000000
16:01:27  V0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V1=ffffffffffffffff (f: 4294967296.000000, d: -nan)
16:01:27  V2=43eed4e6357f4219 (f: 897532416.000000, d: 1.777323e+19)
16:01:27  V3=00000000ffffffff (f: 4294967296.000000, d: 2.121996e-314)
16:01:27  V4=bfd00ea348b88334 (f: 1220051712.000000, d: -2.508934e-01)
16:01:27  V5=3fd5575b0be00b6a (f: 199232368.000000, d: 3.334568e-01)
16:01:27  V6=3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
16:01:27  V7=4035000000000000 (f: 0.000000, d: 2.100000e+01)
16:01:27  V8=000003c10000745f (f: 29791.000000, d: 2.039238e-311)
16:01:27  V9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V10=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V16=401e840b4ac4e4d2 (f: 1254417664.000000, d: 7.628949e+00)
16:01:27  V17=00000000000003e8 (f: 1000.000000, d: 4.940656e-321)
16:01:27  V18=4000000000000000 (f: 0.000000, d: 2.000000e+00)
16:01:27  V19=3f9eb851eb851eb8 (f: 3951369984.000000, d: 3.000000e-02)
16:01:27  V20=3fb1eb851eb851ec (f: 515396064.000000, d: 7.000000e-02)
16:01:27  V21=0000000000000800 (f: 2048.000000, d: 1.011846e-320)
16:01:27  V22=3fc999999999999a (f: 2576980480.000000, d: 2.000000e-01)
16:01:27  V23=3fa999999999999a (f: 2576980480.000000, d: 5.000000e-02)
16:01:27  V24=3fd6666666666666 (f: 1717986944.000000, d: 3.500000e-01)
16:01:27  V25=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V26=3fb999999999999a (f: 2576980480.000000, d: 1.000000e-01)
16:01:27  V27=000000000000000a (f: 10.000000, d: 4.940656e-323)
16:01:27  V28=0000000000000001 (f: 1.000000, d: 4.940656e-324)
16:01:27  V29=0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:01:27  V30=7465472f736e6f69 (f: 1936617344.000000, d: 4.875044e+252)
16:01:27  V31=7079742f6e6f6974 (f: 1852795264.000000, d: 6.322810e+233)
16:01:27  Module=/lib64/libc.so.6
16:01:27  Module_base_address=0000FFFF828CC000
16:01:27  Target=2_90_20250205_952 (Linux 5.14.0-511.el9.aarch64)
16:01:27  CPU=aarch64 (4 logical CPUs) (0x1db0d8000 RAM)
16:01:27  ----------- Stack Backtrace -----------
16:01:27  __memcpy_generic+0x70 (0x0000FFFF829677F0 [libc.so.6+0x9b7f0])
16:01:27  bytecodeLoopCompressed+0x181f8 (0x0000FFFF820B5BD8 [libj9vm29.so+0xb5bd8])
16:01:27  c_cInterpreter+0x54 (0x0000FFFF82124090 [libj9vm29.so+0x124090])
16:01:27  ---------------------------------------

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Personal_testList_0/62/console

@amicic amicic force-pushed the array_access_macros branch from 9f7becc to 7131b4c Compare February 6, 2025 14:21
@amicic
Copy link
Contributor Author

amicic commented Feb 6, 2025

suspecting on index being passed as an expression at some spots - added oval brackets at macro expansion

@amicic
Copy link
Contributor Author

amicic commented Feb 6, 2025

jenkins test sanity,extended.functional alinux64 jdk17

@amicic
Copy link
Contributor Author

amicic commented Feb 6, 2025

still failing - looking

@amicic amicic force-pushed the array_access_macros branch 2 times, most recently from 907e5fd to 3f393ca Compare February 7, 2025 01:07
@amicic
Copy link
Contributor Author

amicic commented Feb 7, 2025

jenkins test sanity,extended.functional alinux64 jdk17

@amicic
Copy link
Contributor Author

amicic commented Feb 7, 2025

Seems like Unsafe code intentionally deals with index as UDATA, so it was incorrect to cast index down to U_32 at the top level API. Still doing it just for index recalculation for discontiguous arrays, for speed reasons (it does actually make a difference and it's also what original macro was doing).

Macros for Array Effective Address calculations are inherently inlined
and seem to create much pressure either on register or code cash in
Bytecode interpreter.

They are rewritten in C and forced to be outlined specifically for GNU
compilers on X and Z, where we saw regression when Offheap was
introduced (what made the macros more complex, creating even more
pressure).

For other platforms where we did not see regression, we continue to
inline (ATM unknown if outlining would have negative or possitive
effect). Hence we still keep it in a header (*.h) file.

Signed-off-by: Aleksandar Micic <[email protected]>
@amicic amicic force-pushed the array_access_macros branch from 3f393ca to a872cbd Compare February 7, 2025 01:11
@amicic
Copy link
Contributor Author

amicic commented Feb 7, 2025

jenkins test sanity,extended.functional alinux64 jdk17

@amicic
Copy link
Contributor Author

amicic commented Feb 7, 2025

jenkins compile all jdk8

@tajila tajila merged commit 9dd7e0e into eclipse-openj9:master Feb 7, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants